CHAPTER I


ON  HYBRID  APPROACHES  TO  PATTERN  RECOGNITION 
1.1 Introduction

Many methods have been proposed for designing a pattern recognition
system. They can be grouped into two major approaches: the
decision-theoretic or discriminant approach [1-9], and the syntactic or
structural approach [10-12]. From a more general viewpoint, these
approaches can be discussed within the same framework in terms of pattern
representation and decision-making (based on a given pattern
representation). A block diagram of a pattern recognition system based on
this general point of view is given in Figure 1. The subproblem of pattern
representation involves primarily the selection of a representation. The
subproblem of decision-making involves primarily the selection of a decision
criterion or similarity measure. Other approaches include template-matching
[13], problem-solving models [14], category theory [15] and relation theory
[16].

In the template-matching approach, a set of templates or prototypes,
one for each pattern class, is stored in the machine. The input pattern
with unknown classification is matched or compared with the template of each
class, and the classification is based on a preselected matching criterion
or similarity measure (e.g., correlation). In other words, if the input
pattern matches the template of the ith pattern class better than it matches
any other template, then the input pattern is classified as belonging to the
ith pattern class. Usually, for the simplicity of the machine, the input
patterns and the templates are represented in their raw-data form, and the
decision-making process is nothing but matching the unknown input to each
template. The template-matching approach has been used in some existing
printed-character recognizers and bank-check readers [13,19]. The
disadvantage of this approach is that it is sometimes difficult to select a
good template for each pattern class, and to define an appropriate matching
criterion. This difficulty is especially pronounced when large variations
and distortions are expected in the patterns under study. Recently, the use
of flexible template-matching or "rubber mask" techniques has been proposed
[17].
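
The matching step described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the 3x3 binary
templates, the labels "I" and "L", and the function names are hypothetical,
and normalized correlation is used as one possible matching criterion.

```python
import math


def correlation(a, b):
    """Normalized correlation between two equal-length raw-data vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_by_template(pattern, templates):
    """Return the label of the template that correlates best with the pattern."""
    return max(templates, key=lambda label: correlation(pattern, templates[label]))


# Hypothetical 3x3 binary "characters", flattened row by row.
TEMPLATES = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
}

noisy_l = [1, 0, 0, 1, 0, 0, 1, 1, 0]  # an "L" with one pixel missing
print(classify_by_template(noisy_l, TEMPLATES))
```

The sketch also shows the weakness noted in the text: the decision depends
entirely on how well the stored templates and the chosen criterion tolerate
distortion.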

1.2 Decision-Theoretic Approach

In the decision-theoretic approach, a pattern is represented by a set
of N features or an N-dimensional feature vector, and the decision-making
process is based on a similarity measure which, in turn, is expressed in
terms of a distance measure or a discriminant function. In order to take
noise and distortions into consideration, statistical and fuzzy-set methods
have been proposed [50]. Each pattern class can be characterized by an
N-dimensional class-conditional probability density function or a fuzzy set,
and the classification (decision-making) of patterns is then based on a
(parametric or nonparametric) statistical decision rule or a (fuzzy)
membership function. A block diagram of a decision-theoretic pattern
recognition system is given in Figure 2.
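
As a minimal sketch of a parametric statistical decision rule, the following
Python code applies the Bayes rule with Gaussian class-conditional densities.
The two classes, their priors, means and variances are hypothetical, and the
diagonal-covariance Gaussian model is an assumption made only for
illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log of a Gaussian density with diagonal covariance, evaluated at x."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def classify(x, classes):
    """Pick the class maximizing log prior + log class-conditional density."""
    return max(
        classes,
        key=lambda c: math.log(classes[c]["prior"])
        + log_gaussian(x, classes[c]["mean"], classes[c]["var"]),
    )


# Hypothetical two-class problem with N = 2 features.
CLASSES = {
    "A": {"prior": 0.5, "mean": [0.0, 0.0], "var": [1.0, 1.0]},
    "B": {"prior": 0.5, "mean": [3.0, 3.0], "var": [1.0, 1.0]},
}

print(classify([0.4, -0.2], CLASSES), classify([2.5, 3.1], CLASSES))
```

With equal priors and equal variances, as here, the rule reduces to assigning
the feature vector to the nearest class mean, which is the distance-measure
view mentioned in the text.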

It should be noted that the template-matching approach could be
regarded as a special case of the decision-theoretic approach. In such a
case, each pattern is represented by a feature vector, and the
decision-making process is based on a simple similarity (matching) criterion
such as the use of correlation.

Applications of decision-theoretic pattern recognition include
character recognition [13,18,19], biomedical data analysis and diagnostic
decision-making [20-22], remote sensing [18,23], target detection and
identification [3,24], failure analysis and diagnosis of engineering systems
[25,26], machine parts recognition and inspection in the automation of
manufacturing processes [27-30], processing of seismic waves [24], modeling
of socio-economic systems [31], and archaeology (classification of ancient
objects) [32].

1.3 Syntactic Approach

In the syntactic approach, a pattern is represented as a string, a tree
or a graph of pattern primitives and their relations. The decision-making
process is in general a syntax analysis or parsing procedure. Special cases
include the use of similarity (or distance) measures between two strings,
two trees, or two graphs [33]. A block diagram of a syntactic pattern
recognition system is given in Figure 3.

Conventional parsing requires an exact match between the unknown input
sentence and a sentence generated by the pattern grammar. Such a rigid
requirement often limits the applicability of the syntactic approach to
noise-free or artificial patterns. Recently, the concept of a similarity
measure between two sentences, and between one sentence and a language, has
been developed. Parsing can be performed using a selected similarity
measure (a distance measure or a likelihood function), and an exact match
becomes unnecessary. Such a parsing procedure is called "error-correcting"
parsing [34].
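
The idea of a distance between a sentence and a language can be illustrated
with a small Python sketch. It assumes the Levenshtein (edit) distance as
the similarity measure and a hypothetical finite language listed explicitly;
an actual error-correcting parser works on the grammar itself rather than
enumerating sentences.

```python
def edit_distance(x, y):
    """Minimum number of substitutions, insertions and deletions turning x into y."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, start=1):
        curr = [i]
        for j, yj in enumerate(y, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete xi
                    curr[j - 1] + 1,           # insert yj
                    prev[j - 1] + (xi != yj),  # substitute (free if equal)
                )
            )
        prev = curr
    return prev[-1]


def nearest_sentence(y, language):
    """Sentence of the (finite) language closest to the observed string y."""
    return min(language, key=lambda x: edit_distance(x, y))


LANGUAGE = ["abab", "ccdd", "abba"]  # hypothetical pattern language
print(nearest_sentence("abdb", LANGUAGE), edit_distance("abdb", "abab"))
```

The distance from the input to the language is then the distance to this
nearest sentence, so an exact match is no longer required.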

It should be noted that the template-matching approach could also be
regarded as a special case of the syntactic approach. In such a case, each
pattern is represented by a string (or tree, or graph) of primitives and the
decision-making process is based on a similarity or distance measure between
two strings (or two trees, or two graphs).


Applications of syntactic pattern recognition include character
recognition [35-37], waveform analysis [36,38,39], speech recognition
[36,40], automatic inspection [41,42], fingerprint classification and
identification [36,43], geological data processing [44], target recognition
[45], machine part recognition [36,46] and remote sensing [36].

There are at least four ways to mix the decision-theoretic approach and
the syntactic approach. They are: (i) decision-theoretic followed by
syntactic approach, (ii) use of stochastic languages, (iii) stochastic
error-correcting syntax analysis, and (iv) matching of stochastic graphs.
In the following sections, we briefly describe each of these mixed
approaches.

1.4 Decision-theoretic followed by syntactic approach

In this approach, pattern primitives are recognized by a
decision-theoretic method and pattern structures are analyzed by a syntactic
method. For example, in speech recognition, speech wave segments can be
recognized by a decision-theoretic method. Strings of these segments,
characterized by a set of syntax rules, provide the final description of
continuous speech waveforms [18,36,47]. Similarly, such a hybrid approach
can be used for EEG analysis [39]. In LANDSAT data interpretation, each
pixel in a LANDSAT image can be classified by a decision-theoretic method
(e.g., the maximum-likelihood classification rule) on the basis of the
four-band spectral measurement. Structural (or spatial) relations among
various pixels can be described by a syntactic method. Specifically, the
structure of highways (or rivers) can be represented by trees with
"concrete-like" or water pixels and characterized by a tree grammar.
Consequently, the recognition of highways from all concrete-like pixels can
be easily accomplished by using a tree automaton [18,48]. The recognition
of rivers from all the pixels classified as water can be similarly
performed.

Recently, a shape recognition procedure with two types of primitive has
been proposed [49]. The two primitives, curve primitive and angle
primitive, are described by attributes and recognized by a
decision-theoretic method. Strings of curve and angle primitives are used
to represent the outer boundaries of an object with different starting
points, and are characterized by a set of attributed syntax rules.
Recognition of object shapes is accomplished by parsing the strings
describing object boundaries with respect to the syntax rules. The
structural or syntactic information contained in the syntax rules is, in
fact, used to improve the primitive recognition accuracy. In other words,
primitive recognition and structural analysis (or parsing) are carried out
in one stage rather than one following the other in two separate stages.
With the addition of an error-correcting technique to the attributed shape
grammar, such a hybrid approach can be used for recognition of distorted and
partial shapes [65].

1.5 Use of stochastic languages

In order to describe noisy and distorted patterns under ambiguous
situations, the use of stochastic languages has been suggested [10]. With
probabilities associated with its grammar rules, a stochastic grammar
generates sentences with a probability distribution. The probability
distribution of the sentences can be used to model the noisy situations.

A stochastic grammar is a four-tuple $G_s = (V_N, V_T, P_s, S)$ where
$V_N$ is a finite set of nonterminals, $V_T$ is a finite set of terminals,
$S \in V_N$ is the start symbol, and $P_s$ is a finite set of stochastic
productions. For a stochastic context-free grammar, a production in $P_s$
is of the form

$$A_i \xrightarrow{p_{ij}} \alpha_j, \qquad A_i \in V_N, \quad \alpha_j \in (V_N \cup V_T)^*$$

where $p_{ij}$ is called the production probability. The probability of
generating a string x, called the string probability p(x), is the product of
all production probabilities associated with the productions used in the
generation of x. The language generated by a stochastic grammar consists of
the strings generated by the grammar and their associated string
probabilities.
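
To make the string probability concrete, here is a minimal Python sketch.
The grammar (S -> aS with probability 0.6, S -> b with probability 0.4) and
the function names are hypothetical, chosen only so that the product of
production probabilities is easy to follow.

```python
from math import prod

# Hypothetical stochastic context-free grammar:
#   S -> a S   with probability 0.6
#   S -> b     with probability 0.4
PRODUCTIONS = {
    ("S", ("a", "S")): 0.6,
    ("S", ("b",)): 0.4,
}


def string_probability(derivation):
    """Product of the probabilities of the productions used in a derivation."""
    return prod(PRODUCTIONS[p] for p in derivation)


def derive(derivation, start="S"):
    """Apply each production to the leftmost occurrence of its left-hand side."""
    form = [start]
    for lhs, rhs in derivation:
        i = form.index(lhs)
        form[i:i + 1] = list(rhs)
    return "".join(form)


# Derivation S => aS => aaS => aab.
steps = [("S", ("a", "S")), ("S", ("a", "S")), ("S", ("b",))]
print(derive(steps), string_probability(steps))
```

For this grammar the string a^n b has probability 0.6^n x 0.4, so the string
probabilities over the whole language sum to one.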

By associating probabilities with the strings, we can impose a
probabilistic structure on the language to describe noisy patterns. The
probability distribution characterizing the patterns in a class can be
interpreted as the probability distribution associated with the strings in a
language. Thus, statistical decision rules can be applied to the
classification of a pattern under ambiguous situations (for example, the
maximum-likelihood or Bayes decision rule). A block diagram of such a
recognition system using the maximum-likelihood decision rule is shown in
Figure 4. For a given stochastic finite-state grammar $G_s$, we can
construct a stochastic finite-state automaton to recognize only the language
$L(G_s)$ [10]. For stochastic context-free languages, stochastic syntax
analysis procedures are in general required. Because information about the
production probabilities is available, it can be used to improve the speed
of syntactic analysis. Of course, in practice, the production probabilities
will have to be inferred from the observation of a relatively large number
of pattern samples. When the imprecision and uncertainty involved in the
pattern description can be modeled by using fuzzy set theory, the use of
fuzzy languages for syntactic pattern recognition has recently been
suggested [50].


1.6 Stochastic Error-Correcting Syntax Analysis

Recently, error-correcting syntax analysis has been proposed for the
recognition of noisy and distorted patterns [33,51]. Referring to Fig. 3, a
segmentation error can be represented by a deletion or insertion of a
primitive in a sentence. A primitive recognition error can be expressed as
a substitution of one primitive by another. With the introduction of
probabilities of substitution, deletion and insertion errors, a stochastic
model of syntax errors can be formulated. Using this model, the probability
of deforming a sentence x to a sentence y, $q(y|x)$, can be computed. The
maximum-likelihood error-correcting parsing algorithm searches for a
sentence x, $x \in L(G_s)$, such that

$$q(y|x)\,p(x) = \max_{z} \{\, q(y|z)\,p(z) \mid z \in L(G_s) \,\}$$

where p(z) is the probability of generating z by the stochastic (pattern)
grammar $G_s$. The term $q(y|x)\,p(x)$ is called the probability that a
sentence y is an error-deformed sentence of $L(G_s)$ and is denoted as
$q(y|G_s)$.
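
A much simplified sketch of the maximum-likelihood search is given below in
Python. It assumes a hypothetical finite language with known string
probabilities p(z), a two-symbol alphabet, and a deformation model with
substitution errors only (no insertions or deletions), so that q(y|x)
factors over positions. A parser for a real stochastic grammar would not
enumerate the language this way.

```python
from math import prod

# Hypothetical stochastic language: sentence -> p(z).
LANGUAGE = {"ab": 0.5, "ba": 0.3, "bb": 0.2}

# Assumed primitive recognizer accuracy over the alphabet {a, b}: a primitive
# is read correctly with probability 0.8, otherwise as the other symbol.
P_CORRECT = 0.8


def q(y, x):
    """Probability of deforming sentence x into y (substitution errors only)."""
    if len(y) != len(x):
        return 0.0  # insertions and deletions are not modeled in this sketch
    return prod(P_CORRECT if yi == xi else 1.0 - P_CORRECT for yi, xi in zip(y, x))


def ml_correct(y):
    """Return the sentence x maximizing q(y|x) p(x), together with that value."""
    best = max(LANGUAGE, key=lambda z: q(y, z) * LANGUAGE[z])
    return best, q(y, best) * LANGUAGE[best]


print(ml_correct("aa"))
```

Here the observed string "aa" is not in the language, and the search returns
"ab" because its higher prior probability outweighs the equally likely
deformation from "ba".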

By adopting the method of constructing covering grammars used by Aho
and Peterson [34], we can construct a stochastic error-induced grammar from
the original stochastic context-free (pattern) grammar to accommodate the
stochastic deformation model. A modified Earley parser for the stochastic
error-induced grammar is proposed to implement the search for the most
likely error correction [51]. A more general deformation model (including
the use of attributed grammars) and its corresponding Bayes error-correcting
recognition system have recently been reported [52,66,67].

1.7 Matching of Stochastic Graphs

Relational graphs are used in syntactic pattern recognition to
represent the structural information of patterns [10]. The nodes in a
relational graph denote subpatterns and pattern primitives, and the branch
between two nodes represents the relation between subpatterns and/or
primitives. Recently, Tsai and Fu [53] have proposed to extend the
stochastic deformation model described in Section 1.6 to error-correcting
graph matching. Attributed relational graphs for syntactic pattern
recognition are first defined. A stochastic deformation model for
attributed relational graphs is then formulated. Only the case where the
deformation does not affect the structure of the underlying unlabeled graph
but only corrupts the information contained in the primitives and relations
is considered. Such a deformation is called graph-preserved deformation.
Pattern deformation probabilities can be calculated from primitive
deformation and relation deformation probabilities. An ordered-search
algorithm is proposed for determining the maximum-likelihood
error-correcting isomorphisms of attributed relational graphs.

1.8 Remarks

The decision-theoretic followed by syntactic approach has been the most
popular hybrid approach. The approach is simple to apply. However, noise
and distortions are only considered at the local or primitive level.
Segmentation error and structure distortion are not taken into
consideration. The approach of using stochastic languages can take care of
noise and distortion at both the primitive and structure levels,
particularly when the primitives are recognized by decision-theoretic
methods. Practical applications include ECG interpretation and fingerprint
classification [54,55]. Unfortunately, a large number of training samples
is often required to accurately infer the production probabilities.
Segmentation and primitive recognition errors are explicitly considered in
error-correcting syntax analysis. Probabilities for different errors can be
estimated (or subjectively assigned) from the performance evaluation of
segmentation and primitive recognition devices. One application of this
approach is the recognition of spoken words and phrases [56]. In practice,
parsing time may need to be reduced by using sequential or parallel parsing
techniques [57,58]. Attributed grammars can be used to provide both
syntactic and semantic information for pattern description [10,49]. A
syntactic-statistical approach to pattern recognition based on attributed
grammars has recently been proposed [66]. Attributed relational graphs are
regarded as a more general model for describing two- and three-dimensional
patterns. It is anticipated that finding error-correcting graph
isomorphisms will be rather slow. The use of parallel processing could be
one way to speed up the procedure. The practical utility of this approach
still needs to be tested.

The idea of using hybrid approaches in solving practical pattern
recognition problems is not new [10,59-64]. In practice, only the
decision-theoretic followed by syntactic approach can be easily applied.
There is certainly a need for further studies on other possibilities of
mixing the decision-theoretic and the syntactic approaches.

Fig. 1. A general pattern recognition system.

Fig. 2. Block diagram of a decision-theoretical pattern recognition system.

Fig. 3. Block diagram of a syntactic pattern recognition system.

Fig. 4. Maximum-Likelihood Syntactic Recognition System.

CHAPTER II


AUTOMATIC  INSPECTION  BY  LOTS  IN  THE 
PRESENCE  OF  CLASSIFICATION  ERRORS 

2.1 Introduction

Automatic inspection of manufactured products is a very important
application area of pattern recognition. The goals of automating this
aspect of industrial production include raising the standard of quality
control by improving the reliability of the existing inspection channels,
speeding up the inspection process to keep up with the increasing production
output brought about by mechanization and automation of industrial
processes, relieving human inspectors of repetitive and boring tasks and,
last but not least, minimizing the cost of quality control. To date a
number of promising applications of pattern recognition techniques to
various automatic inspection problems have been reported. In particular,
methods for automatically inspecting reed switches have been described by
Jarvis [1] and Van Daele et al. [2]. Automatic systems for inspection of
printed circuit boards [3] and LSI circuit masks [4] have recently been
developed. Pattern recognition techniques have also been applied to the
problem of inspecting pharmaceutical products [5], moving metal surfaces
[6], gas meters [7], etc.

Ideally, in quality control one would like to inspect every single item
manufactured. In practice, however, 100% inspection is not always
economically feasible, even assuming an advanced stage of automation, and it
then becomes necessary to control the quality of a small sample of the items
in each lot. On the basis of the number of defective items in the sample
set, a decision regarding acceptance or rejection of the whole lot is then
made.

The philosophy of quality control by lots is to reduce inspection
costs by finding the minimum sample size required to ensure that each lot of
products meets the quality standards specified by the consumer, while
keeping at a low level the manufacturer's risk of having to inspect all the
items in any lot of acceptable quality, or even of having to discard these
items. The design of a two-sided acceptance sampling plan for this purpose
is based on a family of operating characteristics which define the
probability of accepting a lot of a given size as a function of the rate of
defective items in the lot. The actual rate of defective items in the
sample taken from the lot serves as the parameter of the family of these
functions.

Quality control by lots is a long established approach to industrial
inspection with a well developed methodology [8,12]. Unfortunately, the
existing acceptance sampling plan design techniques are applicable only
under the assumption that the classification of individual items in the
sample set into the categories of defective and non-defective products is
error free. When pattern recognition systems are employed to inspect
individual items, this assumption is not necessarily satisfied. The
presence of classification errors affects the probability distribution of
defective items in the sample set, which has to be taken into account when
designing an acceptance sampling scheme.

The effect of classification errors on acceptance sampling plans has
recently been studied by Kittler and Pau [9], who considered the case where
the a priori probabilities of the categories of defective and non-defective
items in the lot differ from those of the mother population. They derived a
system of operating characteristics which are an essential prerequisite for
the design of a suitable acceptance sampling plan. In contrast to the
conventional approach, the operating characteristics in their method are
parameterized in terms of the rate of items classified as defective rather
than the actual rate of defective products, which is unknown. In this
chapter it is assumed that the mixture probability distribution of items in
a given lot has an arbitrary form which cannot be functionally related to
the mother mixture population. Such a situation can arise in an environment
with rapidly changing conditions in the manufacturing process or in the raw
material used. It will be shown in Section 2.3 that in this case the
probability distribution of classification errors cannot be predetermined.
Consequently, it is not possible to obtain the operating characteristic
which is required for the design of a conventional two-sided acceptance
sampling plan (a plan satisfying both the consumer's and the manufacturer's
specifications). Instead, in Section 2.4, a new quality control procedure
is proposed which guarantees the product quality levels specified by the
consumer. First, however, in Section 2.2, the model considered and the
essential mathematical preliminaries will be introduced.

2.2 Preliminaries

Let $x = [x_1, \ldots, x_p]^T$ be a p-dimensional pattern feature vector
representing an item to be inspected, with T denoting the transpose. We
denote the classes of non-defective and defective products by $\omega_1$ and
$\omega_2$, respectively. Further, let the pattern representation space be
partitioned into non-overlapping regions $S_1$ and $S_2$ associated with
classes $\omega_1$ and $\omega_2$. Then any pattern vector x in the region
$S_i$ will be considered as belonging to class $\omega_i$, i.e. the decision
rule determining whether x represents a good or defective item can be stated
as

$$\text{assign } x \text{ to } \omega_i \text{ if } x \in S_i. \qquad (1)$$


The regions $S_i$ could be determined using standard pattern
classification learning algorithms [10] on the basis of the information
conveyed by a training data set with labelled samples. Alternatively (and
this is often the case in quality control) regions $S_i$ are defined by
prespecifying tolerances on the product characteristics as embodied by
feature measurements $x_j$, $j = 1, 2, \ldots, p$.

In the following we assume that the a posteriori class probability
functions $P(\omega_i|x)$ are known at every x. This assumption implies
that either the physical processes involved in generating patterns from
classes $\omega_1$ and $\omega_2$ can be modelled with sufficient accuracy,
or the training data set is large enough to allow functions $P(\omega_i|x)$
to be estimated with negligible bias and variance.

As pointed out in the introductory section, in the model considered in
this paper it is assumed that the items in a lot are drawn from a mixture
probability distribution characterized by a density function $p(x)$. Then
the classification error of type I, giving the rate of samples from class
$\omega_1$ being assigned to class $\omega_2$ by decision rule (1), is
defined as

$$e_1 = \int_{S_2} P(\omega_1|x)\,p(x)\,dx. \qquad (2)$$

Similarly, the classification error of type II is given by

$$e_2 = \int_{S_1} P(\omega_2|x)\,p(x)\,dx. \qquad (3)$$

The rate of samples classified to $\omega_1$, $c_1$, is given by

$$c_1 = \int_{S_1} p(x)\,dx, \qquad (4)$$

while the true rates of defective and non-defective items, $P$ and $P_n$,
can be written respectively as

$$P = \int P(\omega_2|x)\,p(x)\,dx \qquad (5)$$

and

$$P_n = \int P(\omega_1|x)\,p(x)\,dx. \qquad (6)$$

Note that the rate of items classified as defective by our
decision-making system can be expressed as

$$c_2 = \int_{S_2} P(\omega_1|x)\,p(x)\,dx + \int_{S_2} P(\omega_2|x)\,p(x)\,dx. \qquad (7)$$

Utilizing equations (2), (3) and (5) we get

$$c_2 = e_1 + P - e_2. \qquad (8)$$
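
The relation between the observed rate of items classified as defective, the
two error rates and the true defective rate can be checked numerically. The
Python sketch below assumes a hypothetical feature space discretized into
four cells, so that the integrals become sums; the cell probabilities, the
posteriors and the choice of region S2 are illustrative values only.

```python
def rates(p, post_defective, s2):
    """Return (e1, e2, P, c2) for a discretized feature space.

    p[i]              probability mass of cell i (density of the lot)
    post_defective[i] a posteriori probability of the defective class in cell i
    s2                set of cells forming the "defective" decision region S2
    """
    cells = range(len(p))
    e1 = sum((1.0 - post_defective[i]) * p[i] for i in s2)          # good items in S2
    e2 = sum(post_defective[i] * p[i] for i in cells if i not in s2)  # defective in S1
    true_rate = sum(post_defective[i] * p[i] for i in cells)         # P
    c2 = sum(p[i] for i in s2)                                       # classified defective
    return e1, e2, true_rate, c2


# Hypothetical lot: four cells, the last two assigned to the defective class.
p = [0.4, 0.3, 0.2, 0.1]
post_defective = [0.05, 0.2, 0.7, 0.95]
e1, e2, P, c2 = rates(p, post_defective, s2={2, 3})
print(e1, e2, P, c2, e1 + P - e2)
```

The last two printed values agree up to rounding, which is the identity
c2 = e1 + P - e2 for this example.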

As in the case of the model discussed in [9], the only observable
quantity in equation (8) is $c_2$. However, in contrast to that model, here
the probability distributions of errors $e_1$ and $e_2$ cannot be
approximated by appropriate binomial distributions. The reason for this is
that the expected value of $e_i$, which could be used as the parameter of an
approximating binomial distribution, is not known. Consequently the
approach to designing an acceptance sampling plan for quality control
proposed in [9] cannot be adopted. In the following section the probability
distribution of the rate of defective items in the sample set taken from a
lot will be derived. This distribution will then be used as a basis of a
new quality control scheme proposed in Section 2.4.

2.3 The Probability Distribution of the Rate of Defective Items

It was shown in [9] that, in order to check the quality of a given lot
of products in the case where the probability structure underlying the
distribution of items in a lot differs from that of the training data only
in the a priori probabilities of classes $\omega_1$ and $\omega_2$, it was
sufficient to observe the realization of variable $c_2$ and compare it with
the predetermined threshold. Moreover, detailed knowledge of the a
posteriori probabilities of classes $\omega_1$ and $\omega_2$ for each
element x of the test set was not required. In the case of the model
considered here the situation is somewhat different. We can also observe
$c_2$ by simply examining the position of each pattern x in the test set
with respect to region $S_2$. However, since $p(x)$ is assumed to be of a
non-parametric form and, in general, changing from one lot to another, the
probability distribution of classification errors must be determined for
each test set separately. This implies that the operating characteristics
cannot be precomputed and an alternative strategy must be adopted.
Moreover, in order to determine the probability distribution of errors, it
must be possible to observe the class a posteriori probabilities for every
x. It is apparent that, from the computational point of view, any quality
control procedure for the present model will be considerably more involved.

Let us denote the actual rate of misclassified patterns from class
$\omega_i$, $i = 1, 2$, by $\tau_i$ (realization of $e_i$). It has been
shown elsewhere [11] that the distribution function of $\tau_i$ is given as

$$g(\tau_i = k \mid c_i) = \frac{1}{k} \sum_{j=1}^{k} (-1)^{j-1}\, g(\tau_i = k - j \mid c_i) \sum_{t=1}^{n - n c_i} \left[ \frac{P(\omega_i|x_t)}{1 - P(\omega_i|x_t)} \right]^{j} \qquad (9)$$

with

$$g(\tau_i = 0 \mid c_i) = \prod_{t=1}^{n - n c_i} \left[ 1 - P(\omega_i|x_t) \right].$$

It cannot be overemphasized that the probability distribution in (9)
is correct only under the assumption that $P(\omega_i|x)$ is known exactly.
In practice this assumption will not be satisfied. However, here we assume
that the cardinality of the training data set is large enough so that
$P(\omega_i|x)$ can be estimated with sufficient precision. The alternative
would be to take the probability distribution of estimates of
$P(\omega_i|x)$ into account in the analysis. However, from the point of
view of computational complexity, this solution would make the proposed
procedure impracticable.
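
The distribution of the number of misclassified items can be sketched in
Python as follows. The sketch assumes that each item assigned to the other
class is, independently, a misclassified member of the class of interest
with probability equal to its a posteriori probability, and it uses the
standard recursion for the number of successes in independent, non-identical
trials, which starts from the product of the complementary probabilities. A
direct convolution is included as a cross-check. The posterior values and
function names are hypothetical.

```python
def error_count_recursive(posteriors):
    """Distribution of the number of misclassified items via the recursion
    g(k) = (1/k) * sum_{j=1..k} (-1)^(j-1) * g(k-j) * T(j),
    with T(j) = sum_t (p_t / (1 - p_t))^j and g(0) = prod_t (1 - p_t).
    Every posterior must be strictly less than 1."""
    g = [1.0]
    for p in posteriors:
        g[0] *= 1.0 - p
    odds = [p / (1.0 - p) for p in posteriors]
    for k in range(1, len(posteriors) + 1):
        total = sum(
            (-1) ** (j - 1) * g[k - j] * sum(o ** j for o in odds)
            for j in range(1, k + 1)
        )
        g.append(total / k)
    return g


def error_count_direct(posteriors):
    """Same distribution obtained by convolving one item at a time."""
    dist = [1.0]
    for p in posteriors:
        new = [0.0] * (len(dist) + 1)
        for k, v in enumerate(dist):
            new[k] += v * (1.0 - p)   # item t is correctly classified
            new[k + 1] += v * p       # item t is a misclassified one
        dist = new
    return dist


# Hypothetical posteriors for three items assigned to the other class.
posteriors = [0.1, 0.3, 0.25]
print(error_count_recursive(posteriors))
print(error_count_direct(posteriors))
```

The alternating signs make the recursion numerically delicate for large test
sets, so the direct convolution may be the safer choice in practice.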

The probability distribution of $P$ in equation (8) is, of course,
given by the hypergeometric distribution. Since we know the distributions
of all the quantities appearing in equation (8), we can now evaluate the
probability that, given the number of defective items in the lot, d, $c_2$
will take on a particular value $c_2^*$. This probability is given as the
sum of the probabilities of occurrence of each triplet $P^*$, $\tau_1^*$ and
$\tau_2^*$ yielding $c_2^*$, i.e.

$$\Pr(c_2 = c_2^* \mid d) = \sum_{P^*} \sum_{\tau_1^*} \sum_{\tau_2^*} \Pr(P = P^* \mid d)\, \Pr(\tau_1 = \tau_1^* \mid c_1^*)\, \Pr(\tau_2 = \tau_2^* \mid 1 - c_1^*)\, \delta(c_2^* - P^* - \tau_1^* + \tau_2^*) \qquad (10)$$

with

$$P^* = k/n, \qquad k = 1, 2, \ldots, \min[n, d]$$

$$\tau_1^* = k/(c_2^* n), \qquad k = 0, 1, 2, \ldots, c_2^* n$$

$$\tau_2^* = k/(c_1^* n), \qquad k = 0, 1, 2, \ldots, c_1^* n,$$

where $\delta(\cdot)$ is the Kronecker delta function.

We have thus obtained an expression for calculating the probability
that, given $d$, the classification system will assign exactly $\hat{c}_2 n = c_2^* n$ items
to the class of defectives. Note, however, that the function $\Pr(\hat{c}_2 = c_2^* \mid d)$ of
the argument $d$ is not a probability distribution function. Further, it would be
more convenient to be able to state the probability that the lot contains
exactly $d^*$ bad products given $\hat{c}_2 = c_2^*$, rather than work with the
probability of observing $\hat{c}_2$ under the various hypotheses. Invoking the Bayes
formula for calculating conditional probabilities we get

\[ \Pr(d = d^* \mid \hat{c}_2 = c_2^*) = \frac{\Pr(\hat{c}_2 = c_2^* \mid d = d^*)\, \Pr(d = d^*)}{\Pr(\hat{c}_2 = c_2^*)} \tag{11} \]


We shall assume that all values of $d$ are a priori equally likely. Since
there are $N + 1$ possible values $d$ can assume ($d = 0,1,2,\ldots,N$), then

\[ \Pr(d = d^*) = \frac{1}{N + 1}. \tag{12} \]

The unconditional probability $\Pr(\hat{c}_2 = c_2^*)$ is then

\[ \Pr(\hat{c}_2 = c_2^*) = \frac{1}{N + 1} \sum_{d=0}^{N} \Pr(\hat{c}_2 = c_2^* \mid d). \tag{13} \]

Thus  we  can  write 


\[ \Pr(d = d^* \mid \hat{c}_2 = c_2^*) = \frac{\Pr(\hat{c}_2 = c_2^* \mid d = d^*)}{\sum_{d=0}^{N} \Pr(\hat{c}_2 = c_2^* \mid d)} \tag{14} \]


Using expression (14) we can determine the conditional probability
distribution of the random variable $d$ given $\hat{c}_2 = c_2^*$ and, naturally, the
cumulative distribution

\[ \mu(d) = \sum_{d^*=0}^{d} \Pr(d = d^* \mid \hat{c}_2 = c_2^*). \tag{15} \]
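
The normalization in (14) and the running sum in (15) can be illustrated with a minimal Python sketch. The function name `posterior_and_cumulative` and the likelihood values are hypothetical; the list `likelihood[d]` is assumed to hold the values of the conditional probability of the observed count given d, for d = 0, ..., N, however they were obtained.

```python
from itertools import accumulate


def posterior_and_cumulative(likelihood):
    """Return (posterior, mu) for a uniform prior over d = 0..N.

    likelihood[d] is assumed to hold Pr(c2_hat = c2_star | d); the uniform
    prior 1/(N+1) cancels between numerator and denominator, as in (14).
    """
    total = sum(likelihood)
    if total <= 0:
        raise ValueError("likelihood must have positive total mass")
    posterior = [value / total for value in likelihood]  # equation (14)
    mu = list(accumulate(posterior))                     # equation (15)
    return posterior, mu


# Illustrative (made-up) likelihood values for a lot of size N = 4.
likelihood = [0.0, 0.1, 0.4, 0.3, 0.2]
posterior, mu = posterior_and_cumulative(likelihood)
```

Because the prior is uniform, only the relative sizes of the likelihood values matter; any other prior would have to be multiplied in before normalizing.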


Acceptance  Sampling  Strategy 

The curve $\mu(d)$ defines at every point $d$ the probability that, given
that $\hat{c}_2 = c_2^*$ has been observed, there are at most $d$ defective items in the lot. On
the other hand, the curve $1 - \mu(d)$ defines for every $d$ the probability that
the lot contains more than $d$ defective products. Based on these
observations we can now propose a quality control test as follows.

Hypothesis

Null hypothesis $H_0$ (accept the lot)

number of defectives $d \le d_T$

Alternative hypothesis (reject the lot)

number of defectives $d > d_T$,

where $d_T$ is a given threshold.

Accept $H_0$ if $1 - \mu(d_T) < \beta$, otherwise reject $H_0$. (16)

We  shall  now  summarize  the  proposed  quality  control  scheme. 

1. Classify elements of the test set of size n taken from a lot to obtain
$\hat{c}_2 = c_2^*$.

2. Determine the probability distributions of errors $e_1$ and $e_2$ using
equation (9).

3. Evaluate $\Pr(\hat{c}_2 = c_2^* \mid d)$ for all d according to equation (10).

4. Determine the cumulative distribution $\mu(d)$ in equation (15), and hence $1 - \mu(d)$.

5.  Apply  hypothesis  test  (16). 
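
As a sketch of the final step, the test (16) reduces to a single comparison once the posterior distribution of d is available. The function name `accept_lot` and the numerical values below are illustrative assumptions, not part of the scheme as stated.

```python
def accept_lot(posterior, d_T, beta):
    """Apply test (16): accept H0 when 1 - mu(d_T) < beta.

    posterior[d] is assumed to hold Pr(d | observed classification count)
    for d = 0..N; d_T and beta are the threshold and the risk level.
    """
    mu_dT = sum(posterior[: d_T + 1])  # mu(d_T), probability of at most d_T defectives
    return (1.0 - mu_dT) < beta


# Illustrative (made-up) posterior for a lot of size N = 4.
example_posterior = [0.5, 0.3, 0.15, 0.04, 0.01]
decision_loose = accept_lot(example_posterior, d_T=2, beta=0.1)  # 1 - 0.95 < 0.1
decision_tight = accept_lot(example_posterior, d_T=1, beta=0.1)  # 1 - 0.80 >= 0.1
```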

A few comments are in order here. First of all, there is a difference
between the quality control concepts employed here and in [9]. In the
model discussed in [9], the manufacturer guarantees that the probability
of a lot with $d_T$ or more defective items passing through the quality
control does not exceed $\beta$, i.e. $\Pr(H_0 \text{ accepted} \mid d \ge d_T) \le \beta$. In the
present model, on the other hand, the manufacturer ensures that the probability of
an accepted lot containing $d_T$ or more defective items does not exceed $\beta$, i.e.
$\Pr(d \ge d_T \mid H_0 \text{ accepted}) \le \beta$. Note, however, that for any model we have


\[ \Pr(d \ge d_T \mid H_0 \text{ accepted}) = \frac{\Pr(H_0 \text{ accepted} \mid d \ge d_T)\, \Pr(d \ge d_T)}{\Pr(H_0 \text{ accepted})} \tag{18} \]


Let us consider the relationship (18) in more detail. According to equation
(12) we have

\[ \Pr(d \ge d_T) = \frac{N - d_T + 1}{N + 1}. \]

Further, under the assumption that the a priori probability of accepting
$H_0$ equals the a priori probability of the lot containing $d < d_T$ defective
items, i.e.

\[ \Pr(H_0 \text{ accepted}) = \frac{d_T}{N + 1}, \]

equation (18) implies

\[ \Pr(H_0 \text{ accepted} \mid d \ge d_T) \le \beta\, \frac{d_T}{N - d_T + 1} < \beta, \tag{19} \]

provided

\[ d_T < N - d_T + 1. \tag{20} \]

Thus the quality control scheme developed automatically satisfies the
consumer's risk specifications. It cannot be overemphasized, however, that
the parallel between these two models can be drawn only under the assumption
of the validity of equation (19) and of the particular model for the
distribution $\Pr(d = d^*)$.

The main shortcoming of the proposed sampling scheme is the lack of any
guidelines for choosing the size of the test set, n. In principle, the
acceptance sampling plan should be applied for several (monotonically
increasing) values of n, with the quality check terminating when $1 - \mu(d_T)$ remains
constant.

The computational burden of such a control scheme could be eased by
approximating distribution (9) by the binomial distribution with parameter $e_i$,

\[ e_i = \frac{1}{n - nc} \sum_{t=1}^{n - nc} \left[ 1 - P(\omega_i \mid x_t) \right]. \tag{21} \]


If the accuracy afforded by this approximation were deemed to be
satisfactory, it would be possible to precompute a set of parametric acceptance
sampling plans as in (9) in the form of a look-up table, giving appropriate
values of the acceptance threshold for the whole spectrum of combinations of $e_i$, i = 1,2. Since
$e_i$ depends on n, the acceptance threshold would have to be determined for
the minimum value of n as a function of $e_1$ and $e_2$ satisfying the given
quality control specification.
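
The effect of such an approximation can be examined numerically. The sketch below assumes that the exact error count is a sum of independent per-pattern error indicators with unequal probabilities, and compares its distribution with a binomial distribution having the same expected value. The per-pattern probabilities `q` are made-up illustrative values, and both function names are hypothetical.

```python
from math import comb


def error_count_pmf(per_item_error_probs):
    """Exact distribution of the number of errors for independent items
    with unequal error probabilities (computed by dynamic programming)."""
    pmf = [1.0]
    for q_t in per_item_error_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - q_t)   # item t classified correctly
            nxt[k + 1] += mass * q_t       # item t misclassified
        pmf = nxt
    return pmf


def binomial_pmf(m, e):
    """Binomial distribution with m trials and error parameter e."""
    return [comb(m, k) * e**k * (1.0 - e) ** (m - k) for k in range(m + 1)]


# Assumed per-pattern error probabilities (illustrative only).
q = [0.05, 0.10, 0.20, 0.02, 0.08]
exact = error_count_pmf(q)
e_bar = sum(q) / len(q)            # average error probability, matching the mean
approx = binomial_pmf(len(q), e_bar)
max_gap = max(abs(a - b) for a, b in zip(exact, approx))
```

The quantity `max_gap` gives one simple indication of how much accuracy is given up; whether it is acceptable depends on the quality control specification.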


2.5 Conclusions

A pattern recognition system for the inspection of products by lots has
been studied. It has been shown that in the presence of classification
errors the existing acceptance sampling plans cannot be used. An alternative
quality control procedure has been developed for the model assuming an
arbitrary distribution of patterns in the lot. Computationally the scheme is
very demanding. Some simplification can be achieved by approximating the
actual probability distributions of classification errors with binomial
distributions having identical expected values.
CHAPTER III


VISUAL  SCREENING  OF  INTEGRATED  CIRCUITS  FOR  METALLIZATION  FAULTS 
BY  PATTERN  ANALYSIS  METHODS 

3.1 Introduction

As the complexity of integrated circuits (IC's) increases, the testing
problem becomes more and more acute in terms of final production yield and
IC costs [1].

-  metallization  defects  (open  or  short  circuits,  scratches,  migration, 
corrosion) 

-  wire  and  die  bonds  (open,  shorted,  fatigued) 

-  process faults, esp. oxide pinholes and diffusions

-  surface  defects  and  loose  particles 

-  die  cracks,  dirty  photomasks 

-  external  leads 

-  dielectric  failures 

-  packaging  defects  and  seals 

-  thermal  mismatch 

-  violation  of  design  rules 

The usual testing procedure includes a suitable combination of the following
basic testing processes [1]:

-  pre-cap  and  external  visual  inspection,  and  X-ray  inspection 

-  electrical  testing  (pre-cap  and  after  packaging) 

-  environmental  testing,  especially  temperature  cycling  or  shocks. 

One of the more fundamental constraints on current IC testing
procedures is the fact that those listed above are implemented in sequence, at
many different stages of the manufacturing process [4]. This limitation
becomes even more severe if lot inspection procedures are totally discarded at
some of these stages, in order to achieve 100% testing throughout the
manufacturing process.

(A)  Pre-cap  visual  inspection:  The  thorough  visual  inspection  before  the 
chip  is  encapsulated  is  designed  to  eliminate  unreliable  circuits  from 
further  processing.  The  visual  inspection  is  generally  carried  out  by  human 
operators  with  the  aid  of  high  power  and  low  power  microscopes;  it  includes 
essentially  the  leads,  die  and  wire  bonding,  and  the  topology  of  the  chip. 
Rigorous  adherence  to  100%  pre-cap  visual  inspection  prior  to  encapsulation 
is  also  essential  to  weed  out  potentially  unreliable  circuits  that  would 
otherwise pass all other screening tests. While the effectiveness of
pre-cap visual testing is high, the cost and time related to human operators
are too high.

(B)  Electrical  testing:  Test  patterns  are  usually  reserved  by  the  IC  layout 
to  provide  for  convenient  areas  for  contact  probing  (in  addition  to  per¬ 
manent  leads).  These  areas  are  included  within  the  actual  circuit  die  area 
to  assure  the  necessary  matching  of  characteristics.  In  addition  to  provid¬ 
ing  locations  for  probing  with  minimum  damage  to  the  actual  circuits,  and 
minimum  electrical  interference,  the  test  pattern  may  be  designed  to  provide 
for components which amplify the signals to be measured [5,6] (Fig. 1). A
prerequisite  to  the  use  of  the  test  patterns  through  probing  is  the  proper 
alignment  of  the  IC.  Electrical  testing  does  not  in  general  lead  to  fault 
location  on  the  IC,  and  undetectable  failure  modes  may  exist  especially  when 
only  limited  testing  is  applied. 


(C) Thermal cycling and shocks: These tests will weed out many future faults
or defects not apparent to visual inspection, in addition to helping
localize them. They are recommended, e.g., for crystal imperfections, cracked dies,
oxide pinholes, oxide shorts, passivation defects, opening of thermal seals,
and poor wire bonds. Thermal testing is usually unattended, but lengthy and
costly.

(D)  Integrated  pre-cap  testing:  In  order  to  speed  up  and  automate  the  pre¬ 
cap  testing  stage,  this  chapter  proposes  the  concept  of  integrated  pre-cap 
testing  and  pattern  analysis  methods  to  implement  it.  These  methods  are 
developed  to  allow  for  a  direct  interaction  with  IC  design  tools.  The  goal 
is  defect  detection  and  possible  localization  by  automated  IC  image  analysis 
at  different  wavelengths,  while  electrical  testing  and  some  thermal  tests 
are  carried  out,  without  any  test  bench  transfer.  The  methods  proposed  are 
not  restricted  to  periodic  structures  (although  they  would  be  simplified  by 
such  assumptions),  nor  to  only  vertical/horizontal  etchings.  They  allow  for 
possibly  large  geometric  or  topological  deformations,  and  defect  scattering, 
and are not based on deformations of reference layouts, as is usually the case
in the literature. Also, the topological layout is explicitly allowed to be
context sensitive, as in reality, as opposed to context-free assumptions.

3.2 Automatic Visual Inspection of Masks and IC's

This  section  briefly  surveys  current  and  past  research  in  this  field. 

(A)  Visual and electrical testing both require test patterns on the IC;
original probe-pad test patterns have been designed which contain
visual alignment indicators and probe resistors [5,6]. Coarse prior mask
and IC alignment may be necessary before fine alignment via the test
patterns; line-by-line scanning, or vertical and horizontal boundary
detection, then takes place for the estimation of the bias and the tilt
angle of the mask/IC [8,9,10,23]. Coarse IC alignment is often
mechanical.

(B)  Once  a  proper  fine  alignment  is  completed,  pre-cap  visual  inspection 
may  get  started.  All  existing  automated  methods  can  be  divided  into 
the following three types [11]:

a.  defect  enhancement  by  image  filtering 

b.  image  matching 

c.  pattern  matching 

In this respect, it is necessary to point out that most
methods and systems were actually designed as extensions of printed
circuit board/drawing (PCB) inspection systems, or are at least
restricted to PCB inspection [12,13,14,15,16]. Consequently, the sensors
used are exclusively TV or CCD line-by-line scanners [13], and many
problems specific to IC's/masks have been neglected. At the same time,
no advanced pattern recognition methods have been considered or
developed for this IC application (inspection).

(C)  Defect enhancement by image filtering is done by operating on the
one-dimensional Fourier transform or Vander Lugt filter of each scan, and
by detecting peaks in the transform [17,18]. This approach is
restricted to strictly periodic structures, and remains sensitive to
positioning errors and tolerances.

(D)  Image matching consists of comparing adjacent similar chip patterns on
the same die or mask, with additional comparison to reference patterns
[11,19,20,24]. Defects are recorded as differences by image
subtraction or by correlation [21,22]. In particular, comparison of
50 μ x 50 μ windows on adjacent dies has been investigated and is considered a
standard procedure. The matching is, however, subject to small edge
misalignment, edge coloring, misregistration and human errors, and image
enhancement remains necessary.

(E)  Pattern matching is related to the production of small (5x5 or 10x10
pixels) reticles (straight segments, corners, spots). It has been
shown, in the case of PCB, that only a moderate number (e.g. 500) of
all possible mask patterns is needed to describe all areas in the true
layout [12] for one layer at a time. These features, usually binary,
are then correlated with a reference for failure detection.

(F)  In general, however, these three approaches are organized into a
hierarchical inspection process, with various levels according to the
resolution, area and field of view [27]. Microfaults revealed through
visual or other means (X-rays), depending on the nature of the
substrate, are important. It has been shown that such faults tend to
cluster in groups with varying spacings, and that fault density
clustering may be considered for yield predictions [4,29].


3.3 Integrated Pre-Cap Testing

This testing procedure has been defined in Section 3.1(D). At the
design stage, it uses chip/mask layout by computer-aided design (CAD) with
organization into cells [30,31,32]. The screening will thus be reduced to
sequences of such cells, using the topological layout features in the CAD,
rather than the geometrical features only. The test points are assumed to
be generated and selected by CAD within each cell [33,34].
Whereas some experiments have been carried out on the comparison of
adjacent dies at 3 different colours [20], we here consider a procedure using
2 different wavelengths in the visible domain and 1 or 2 wavelengths in the
middle infrared (IR) domain (3 and 10 μ). In other words, IC pre-cap
inspection with infrared thermography is considered, thus allowing for
simultaneous visual, electrical and eventually thermal testing
[35,36,37,38]. The near-IR inspection will lead to the detection of hot
spots under various testing patterns. The middle-to-long IR inspection will
localize many metallization and bonding defects, again under various
electrical testing patterns. The choice of the wavelengths, resolution and window
sizes will clearly depend on lithographic resolution, circuit density, and
not least substrate properties. Although IR emissivity is affected by
surface condition and substrate, excellent temperature resolutions can be
obtained both on silicon and gallium arsenide [36], thus assisting failure
localization when emission anomalies are observed while electrical testing
is carried on. Another advantage of this procedure is to restrict, by
emissivity considerations, the size of the IC cell portion activated through
electrical stimuli selection [39,51], thus leading to less complex image
patterns to be analyzed than in the visible domain. We will call subcell
any such IC cell portion activated through electrical stimuli.

In  the  remainder  of  this  chapter,  we  restrict  ourselves  to  the  visual 
inspection  methods  related  to  subcells,  within  the  framework  of  the  above 
procedure. The subcell images at the various wavelengths are assumed to be
acquired through the optical field of view of a multi-lens microscope. The
subcell  images  are  also  assumed  to  be  thresholded  and  digitized  on  a  few 
gray  levels. 
3.4 Algorithm #1: Matching Bridges in the Topological Subcell Layouts

(A) Principle: The idea of Algorithm #1 is not to match IC cell patterns,
but to match critical topological elements called bridges [40,41]. The
bridges are those sections of the IC subcell surface which are nonredundant
for proper electrical circuit/subcell operations; in case of defects, short
circuits and conductor-substrate interface anomalies also become parasite
bridges (Fig. 2). The bridges are determined by Algorithm #1 operating on
the graph representation G of the subcell as obtained from the IC image. A
simple thinning procedure using local neighborhood relations is used to get
the graph representation G of the subcell, as seen under current optical and
electrical conditions. It should be noticed that this thinning procedure is
far easier to implement than any parsing of the IC etching boundaries [42].
Smaller defects eliminated because of the thinning will be picked up by
Algorithm #2.

(B) Graph representation of the IC subcell (Fig. 2): The n-node graph
G=(X,U) labeled with a path algebra P can be described by its adjacency
matrix, which is the n x n matrix $A = (a_{ij})$ with entries:

\[ a_{ij} = \begin{cases} l(x_i,x_j), \text{ the label of } (x_i,x_j), & \text{if } (x_i,x_j) \in U \\ \phi, & \text{if } (x_i,x_j) \notin U \end{cases} \]

where $\phi$ is the zero element of P, and U the order relation between nodes
(Fig. 2, Table 1). The k-th power $A^k$ of A can be defined in terms of the
labels of paths on the graph corresponding to A, in the following way. Let
$S^k_{ij}$ be the set of all paths of order k from node $x_i$ to node $x_j$ on the
labeled graph G of A; then:

\[ a^k_{ij} = \bigvee \{\, l(s) ;\ s \in S^k_{ij} \,\} \]

where

∨: join operation in P (Table 1)
·: product in P (Table 1)

Each element $a^k_{ij}$ of $A^k$ is the set of names of all simple paths of order k
from node $x_i$ to node $x_j$.

We shall denote the strong and weak closure of a stable matrix A by
$A^* = (a^*_{ij})$ and $\hat{A} = (\hat{a}_{ij})$, respectively. $A^*$ is such that:

\[ \forall k = 0,1,\ldots \quad A^{*k} = A^{*(k+1)} \]

\[ A^* = E \vee \hat{A}, \qquad A^* = A^{(n-1)} \]

E = identity matrix for · in P (Table 1)

\[ \hat{A} = A A^* = \bigvee_{k=1}^{n} A^k \]

Each element $a^*_{ij}$ of $A^*$ is the set of names of all the simple paths from $x_i$
to $x_j$. Each element $\hat{a}_{ij}$ of $\hat{A}$ is the set of names of all non-null simple
paths from $x_i$ to $x_j$. If only binary labels are considered (binary
pictures), A is the boolean adjacency matrix of the graph G; $A^*$ has then
entries $a^*_{ij} = 1$ if there exist any paths from $x_i$ to $x_j$, and $a^*_{ij} = 0$ otherwise;
whereas $\hat{A}$ has $\hat{a}_{ij} = 1$ if there exist any non-null paths from $x_i$ to $x_j$, and
$\hat{a}_{ij} = 0$ otherwise (see Table 1).
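
For the binary case just described, the two closures reduce to reachability matrices. The following Python sketch computes them with Warshall's algorithm, which is a standard substitute used here for illustration and not the path-algebra procedure of this section; the function names and the 3-node example graph are hypothetical.

```python
def weak_closure(adj):
    """Boolean weak closure: entry (i, j) is 1 iff a non-null path
    (at least one arc) leads from node i to node j."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):              # allow node k as an intermediate node
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def strong_closure(adj):
    """Boolean strong closure: the weak closure joined with the identity,
    so that the null path from each node to itself is also counted."""
    reach = weak_closure(adj)
    for i in range(len(adj)):
        reach[i][i] = 1
    return reach


# Hypothetical directed chain 0 -> 1 -> 2.
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
chain_weak = weak_closure(chain)
chain_strong = strong_closure(chain)
```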


(C) Bridges of the subcell graph G: Let G=(X,E) be the simple graph
representing the subcell whose edges have distinct labels; let H=(X,U) be
the graph with the same nodes as G, and which has two arcs $(x_i,x_j)$ and
$(x_j,x_i)$ between each pair of nodes $x_i, x_j$ which are joined together by an
edge on G; on H, the arcs $(x_i,x_j)$ and $(x_j,x_i)$ both bear the name of the
corresponding edge $(x_i,x_j)$ on G.
An edge $(x_i,x_j)$ of the simple graph G is called a bridge of G, if in
the graph obtained from G by removing this edge, the nodes $x_i$ and $x_j$ are not
connected; in Fig. 3, f is a bridge.
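
The definition can be checked directly by removing each edge in turn and testing whether its end nodes remain connected. The sketch below does exactly this; it is a brute-force illustration of the definition, not the closure-based method of this chapter. The edge list is an assumption, chosen to mirror a seven-node example in which f is the only bridge, and the function names are hypothetical.

```python
def connected(nodes, edges, src, dst):
    """Depth-first search: is dst reachable from src over undirected edges?"""
    neighbours = {v: set() for v in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, stack = {src}, [src]
    while stack:
        current = stack.pop()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return dst in seen


def find_bridges(nodes, labeled_edges):
    """Return the names of the edges whose removal disconnects their end nodes."""
    bridges = []
    for name, (u, v) in labeled_edges.items():
        remaining = [e for other, e in labeled_edges.items() if other != name]
        if not connected(nodes, remaining, u, v):
            bridges.append(name)
    return bridges


# Assumed edge list (edge name -> end nodes), for illustration only.
example_edges = {
    "a": (1, 2), "b": (1, 3), "c": (1, 4), "d": (2, 3), "e": (3, 4),
    "f": (4, 5), "g": (5, 6), "h": (5, 7), "i": (6, 7),
}
example_bridges = find_bridges(range(1, 8), example_edges)
```

This test costs one graph traversal per edge, which is acceptable for small subcell graphs; larger graphs would call for a linear-time bridge-finding method.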


(D) Determination of the bridges of G: Considering H, each entry $a^*_{ij}$ of the
closure of its adjacency matrix A is the set of names of all the bridges
between $x_i$ and $x_j$. Thus, to find these bridges, we need to be able to
compute $A^*$ or $\hat{A}$ directly. One such method is the Jordan elimination method,
which can be applied to compute the weak closure $\hat{A}$. $\hat{A}$ is the least solution
of the equation $Y = AY \vee B$, if we set $A^{(0)} = B^{(0)} = A$, because then $\hat{A} = A^* B = A^* A$.

The steps of the algorithm are the following:


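\[ B^{(k)} = Q^{(k)*} B^{(k-1)}, \quad k = 1,2,\ldots,n; \qquad A^{(0)} = B^{(0)} = A; \qquad \hat{A} = B^{(n)} \]

\[ Q^{(k)*} = \begin{pmatrix} E & B_{12}^{(k-1)} B_{22}^{(k-1)*} & \phi \\ \phi & B_{22}^{(k-1)*} & \phi \\ \phi & B_{32}^{(k-1)} B_{22}^{(k-1)*} & E \end{pmatrix} \]

where:

-  the $B_{ij}$ blocks are made of elements $b_{ij}$, as specified below;

-  the closure of an element is defined in a similar way to the closure of a
matrix;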




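Example: (Fig. 3) Here $\Phi$ is the empty label, and $\phi$ the zero element in G

\[ A = \begin{pmatrix}
\phi & a & b & c & \phi & \phi & \phi \\
a & \phi & d & \phi & \phi & \phi & \phi \\
b & d & \phi & e & \phi & \phi & \phi \\
c & \phi & e & \phi & f & \phi & \phi \\
\phi & \phi & \phi & f & \phi & g & h \\
\phi & \phi & \phi & \phi & g & \phi & i \\
\phi & \phi & \phi & \phi & h & i & \phi
\end{pmatrix} \]

\[ \begin{pmatrix}
\phi & \phi & \phi & \phi & f & f & f \\
\phi & \phi & \phi & \phi & f & f & f \\
\phi & \phi & \phi & \phi & f & f & f \\
\phi & \phi & \phi & \phi & f & f & f \\
f & f & f & f & \phi & \phi & \phi \\
f & f & f & f & \phi & \phi & \phi \\
f & f & f & f & \phi & \phi & \phi
\end{pmatrix} \]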

Other algorithms are given in [41,43,44,45].


(E)  IC  testing:  For  each  wavelength,  each  set  of  electrical  stimuli,  and 
each  subcell  with  resulting  graph  G,  the  bridges  of  G  are  matched  against 
those  of  the  reference  topological  layout  (CAD). 


3.5 Algorithm #2: Computation of a Figure of Merit for the
IC/Mask from a Fuzzy Language Description


(A)  Principle: 

1) Algorithm #2 is designed to compute for each subcell a figure of merit
$\mu$ which accounts both for topological and geometrical faults in the
IC/mask, while still accounting for lithographic resolution and image
acquisition errors. It is better suited than Algorithm #1 for the detection
of irregular growth/etching boundaries, scratches, blobs, and open circuits.
Acceptance or rejection of the IC/mask is on the basis of the figures of
merit of all the subcells.


2)  Algorithm  #2  relies  on  the  following  elements: 

(a) The topological model of the fault-free subcell as a monoid $V_T^*$ of a
context-sensitive language $L_0$; $V_T^*$ is generated by the CAD design
software, restricted to the subcell for which electrical testing is
proceeding (all IC layers are considered).

(b) A fuzzy class membership relation p for each string x of symbols
generated in $L_0$, $0 \le p(x) \le 1$. The actual value of p(x) will be
derived from the subcell image, so as to enhance defects, while only
taking into account those defects the sizes of which are in excess
of process tolerances and optical resolution.

Example: i) If a is a "primitive" etching shape, p(a) could be
proportional to the area of each such pad on the IC, as determined by
thresholding and counting of pixels. Any irregular offshoots,
blobs, or partial bridges would then affect the value of p(a).
ii) p(a) could be defined differently for various values a, to
account for errors at nodes and on the line etching elements.

(c) The recursive computation of the degree of agreement $\mu(x)$ of the
actual subcell x with the language $L_0$, where x is assumed to be
generated by a fuzzy language derived from $L_0$ and from the fuzzy
relationship p; this degree of agreement is the figure of merit for the
actual layout on the IC/mask under test.

(B) Grammar $G_0$ [47]: Let $V_T$ be the finite set of primitives of the IC/mask
layout, including primitives associated with easily detectable geometrical
defects. We denote by $V_T^*$ the set of finite strings obtained by concatenation
of primitives of $V_T$, including the null string $\phi$. The language $L_0$ is a
subset of $V_T^*$, specified by the CAD design; it represents all acceptable
layouts of the subcell. The elements of $L_0$ may be generated by the
grammar $G_0 = (V_N, V_T, P_0, s)$, where $V_T$ is the set of terminals/alphabet, $V_N$ is a
set of nonterminals, $s \in V_N$ is the start symbol, and $P_0$ is the finite set
of production rules. The elements of $P_0$ are rewriting rules of the form
$a \to b$, where a, b are strings in $(V_T \cup V_N)^*$. These rules are those by
which the IC subcell layout is obtained starting with a=s.

One important property of IC's/masks, often overlooked in practice,
is that the corresponding grammar $G_0$ is context-sensitive for most circuit
layouts. $G_0$ is said to be context-sensitive if the productions are of the
form $\alpha_1 A \alpha_2 \to \alpha_1 \beta \alpha_2$, with $\alpha_1, \alpha_2, \beta$ in $(V_T \cup V_N)^*$, A in $V_N$, $\beta \ne \phi$, and
$s \to \phi$ allowed.

(C) Fuzzy grammar [46] $G = (V_N, V_T, \pi, s)$: G is derived from $G_0$ by replacing
$P_0$ by fuzzy rewriting rules $\pi$ defined as $a \xrightarrow{p} b$, where p is the grade of
membership of string b given a, or the figure of merit of b given a as
obtained from the subcell image. G generates a fuzzy language L for which
one can define the degree of properness $\mu(x)$ of any string $x \in V_T^*$, evaluating
to what extent it is correct w.r.t. $G_0$.

(D) Figure of merit of each subcell $x \in V_T^*$:

(a) Let $a_1, \ldots, a_m$ be strings in $(V_T \cup V_N)^*$, and $a_1 \xrightarrow{p_1} a_2, \ldots, a_{m-1} \xrightarrow{p_{m-1}} a_m$
be productions in $\pi$ of G. Then $a_m$ is said to be derivable from $a_1$ in G.
The string x of $V_T^*$ representing the actual IC subcell is said to be in L
iff x is derivable from s. The grade of membership $\mu(x)$ of x in L is
defined as:

\[ \mu(x) = \sup \min \left( p(s \to a_1), \ldots, p(a_m \to x) \right) \]

where the supremum is taken over all derivation chains from s to x.
Consequently, the figure of merit $\mu(x)$ of subcell x is the degree of
properness of the least proper link in the derivation chain generating the
actual subcell x, and $\mu(x)$ is calculated on the "best" chain.
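
The sup-min definition can be illustrated on a small explicit set of derivation steps. The sketch below treats each fuzzy production as a whole-string rewrite with a grade, and searches for the derivation chain whose weakest link is strongest, using a best-first search. It is only an illustration of the definition under these simplifying assumptions, not the recursive algorithm of [48]; the function name, the strings and the grades are all made up.

```python
import heapq


def figure_of_merit(productions, start, target):
    """Return the sup over derivation chains of the min production grade.

    productions maps a string a to a list of (b, grade) pairs, meaning that
    a can be rewritten as b with grade of membership `grade` in [0, 1].
    Returns 0.0 when target is not derivable from start.
    """
    best = {start: 1.0}
    heap = [(-1.0, start)]                 # max-heap on the chain grade
    while heap:
        neg_grade, current = heapq.heappop(heap)
        grade = -neg_grade
        if current == target:
            return grade
        if grade < best.get(current, 0.0):
            continue                       # stale heap entry
        for nxt, p in productions.get(current, []):
            candidate = min(grade, p)      # weakest link along this chain
            if candidate > best.get(nxt, 0.0):
                best[nxt] = candidate
                heapq.heappush(heap, (-candidate, nxt))
    return 0.0


# Hypothetical fuzzy productions with made-up grades.
rules = {
    "s": [("AB", 0.9), ("AC", 0.6)],
    "AB": [("ab", 0.5)],
    "AC": [("ab", 0.8)],
}
merit = figure_of_merit(rules, "s", "ab")  # chains give min 0.5 and min 0.6
```

In this example the chain through "AB" has a stronger first step but a weaker second step, so the chain through "AC" determines the figure of merit.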

(b) G is said to be recursive iff there is an algorithm which
computes $\mu(x)$ recursively. $G_0$ is context-sensitive, and so is G. As it has been
shown [48] that a fuzzy context-sensitive grammar is recursive, the
figure of merit $\mu(x)$ of the subcell x on the IC can be computed recursively.
For details about the design of this algorithm, see [48]. We use it here,
and apply it to the sequence of measured values $p(a_i \to a_{i+1})$, obtained
from the IC image as specified in Section 3.5(A), item (b).

3.6 Algorithm #3: Attribute Labelled Graphs

(A) Principle: In this section, we will only suggest a third class of
algorithms, without any explicit derivation of them. In the case of
IC/masks, probabilistic deformation mechanisms represent an insufficient
formalism [49]. It is here suggested to consider instead a syntactically
driven random field model, also called attribute labelled graph. Instead
of looking at bridges as in Section 3.4, jumps between nodes are
considered here, with each node having a label which may itself be distorted,
representing a topological defect.

(B) Approach: The best current approach is the proposed error-correcting
recognition system presented in Chapter 1 of this report, and in [50].
The defects are modeled as, first, a syntactic deformation of each
primitive or subpattern, followed next by a local deformation. The syntactic
deformations are, however, assumed to be independent of the context and of
the local deformation, which is sometimes inappropriate for defects such
as short or open circuits. The detection is by an error-correcting
isomorphism, and Bayes decision comparing the original and final graphs
(Chapter 1, and [50]).


3.7 Conclusion

This  chapter  presents  two  algorithms,  and  one  approach  for  automated 
pre-cap  visual  inspection  in  the  suggested  integrated  testing  framework. 
Although  an  experimental  validation  is  required,  this  testing  process  and 
the  associated  algorithms  are  much  more  sophisticated  than  current  prac¬ 
tices  and  should  hopefully  give  ideas  for  cost,  time  and  yield  improve¬ 
ments  in  IC  manufacturing. 